Overview

Dataset statistics

Number of variables: 21
Number of observations: 1017209
Missing cells: 2173431
Missing cells (%): 10.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 163.0 MiB
Average record size in memory: 168.0 B
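
The headline figures above can be cross-checked against each other. The sketch below is only an arithmetic check on numbers copied from this report, not pandas-profiling's own code. The per-column missing counts are the ones reported in the variable sections, and the conversion 1 MiB = 1024 * 1024 bytes is an assumption.

```python
# Figures copied from the report; nothing is recomputed from the raw data.
n_variables = 21
n_observations = 1_017_209

# Missing counts as reported per variable (all other shown columns report 0).
missing_by_column = {
    "CompetitionDistance": 2_642,
    "CompetitionOpenSinceMonth": 323_348,
    "CompetitionOpenSinceYear": 323_348,
    "Promo2SinceWeek": 508_031,
    "Promo2SinceYear": 508_031,
    "PromoInterval": 508_031,
}

total_cells = n_variables * n_observations
missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / total_cells

# 163.0 MiB spread over every row, assuming 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = 163.0 * 1024 * 1024 / n_observations

print(missing_cells)
print(round(missing_pct, 1))
print(round(avg_record_bytes, 1))
```

The three printed values should line up with "Missing cells", "Missing cells (%)" and "Average record size in memory".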

Variable types

Numeric: 11
Categorical: 10

Alerts

Date has a high cardinality: 942 distinct values (High cardinality)
DayOfWeek is highly correlated with Open (High correlation)
Sales is highly correlated with Customers and 1 other field (High correlation)
Customers is highly correlated with Sales and 1 other field (High correlation)
Open is highly correlated with DayOfWeek and 2 other fields (High correlation)
Open is highly correlated with Sales and 1 other field (High correlation)
Assortment is highly correlated with StoreType (High correlation)
PromoInterval is highly correlated with Promo2 (High correlation)
StoreType is highly correlated with Assortment (High correlation)
Promo2 is highly correlated with PromoInterval (High correlation)
Sales is highly correlated with Customers and 2 other fields (High correlation)
Promo is highly correlated with Sales (High correlation)
StateHoliday is highly correlated with Open (High correlation)
Assortment is highly correlated with Customers and 1 other field (High correlation)
CompetitionOpenSinceYear is highly correlated with Promo2SinceWeek (High correlation)
Promo2SinceWeek is highly correlated with CompetitionOpenSinceYear and 2 other fields (High correlation)
Promo2SinceYear is highly correlated with Promo2SinceWeek and 1 other field (High correlation)
PromoInterval is highly correlated with Promo2SinceWeek and 1 other field (High correlation)
CompetitionOpenSinceMonth has 323348 (31.8%) missing values (Missing)
CompetitionOpenSinceYear has 323348 (31.8%) missing values (Missing)
Promo2SinceWeek has 508031 (49.9%) missing values (Missing)
Promo2SinceYear has 508031 (49.9%) missing values (Missing)
PromoInterval has 508031 (49.9%) missing values (Missing)
DayOfWeek has 144730 (14.2%) zeros (Zeros)
Sales has 172871 (17.0%) zeros (Zeros)
Customers has 172869 (17.0%) zeros (Zeros)
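
The percentages in the Missing and Zeros alerts are each count divided by the number of observations. A minimal sketch using only counts copied from this report (the dictionary keys are illustrative labels, not library names) reproduces them. It also compares the zero-sales count with the number of rows where Open is 0, as reported in the Open variable section.

```python
n_observations = 1_017_209

# Counts copied from the alerts; keys are illustrative labels only.
alerts = {
    "CompetitionOpenSinceMonth missing": 323_348,
    "Promo2SinceWeek missing": 508_031,
    "DayOfWeek zeros": 144_730,
    "Sales zeros": 172_871,
    "Customers zeros": 172_869,
}

shares = {
    name: round(100 * count / n_observations, 1)
    for name, count in alerts.items()
}
for name, share in shares.items():
    print(f"{name}: {share}%")

# Rows with Open == 0, taken from the Open variable section of the report.
closed_rows = 172_817
# Even if every closed row has zero sales, this many open rows must too.
min_open_rows_without_sales = alerts["Sales zeros"] - closed_rows
print(min_open_rows_without_sales)
```

The last figure is a lower bound: it says how many zero-sales rows cannot be explained by closed stores alone.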

Reproduction

Analysis started: 2022-05-24 07:45:29.954613
Analysis finished: 2022-05-24 07:46:52.223032
Duration: 1 minute and 22.27 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
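
The reported duration follows from the two timestamps. A small check with the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2022-05-24 07:45:29.954613", FMT)
finished = datetime.strptime("2022-05-24 07:46:52.223032", FMT)

# Elapsed wall-clock time, split into whole minutes and remaining seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```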

Variables

Store
Real number (ℝ≥0)

Distinct: 1115
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 558.4297268
Minimum: 1
Maximum: 1115
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 56
Q1: 280
median: 558
Q3: 838
95-th percentile: 1060
Maximum: 1115
Range: 1114
Interquartile range (IQR): 558

Descriptive statistics

Standard deviation: 321.9086511
Coefficient of variation (CV): 0.5764532862
Kurtosis: -1.200523741
Mean: 558.4297268
Median Absolute Deviation (MAD): 279
Skewness: -0.000954879981
Sum: 568039744
Variance: 103625.1797
Monotonicity: Increasing
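
Several of the statistics above are derived from others: the coefficient of variation is the standard deviation divided by the mean, the variance is the standard deviation squared, and the range and IQR are differences of quantiles. The sketch below checks this for Store using only the reported values. It illustrates the relationships and does not reproduce how pandas-profiling computes them.

```python
import math

# Values copied from the Store statistics.
mean = 558.4297268
std = 321.9086511
minimum, q1, q3, maximum = 1, 280, 838, 1115

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(round(cv, 6), round(variance, 2), iqr, value_range)
assert math.isclose(cv, 0.5764532862, rel_tol=1e-6)
```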
Histogram with fixed size bins (bins=50)

Common Values

Value                  Count      Frequency (%)
1                      942        0.1%
726                    942        0.1%
708                    942        0.1%
709                    942        0.1%
713                    942        0.1%
714                    942        0.1%
715                    942        0.1%
717                    942        0.1%
718                    942        0.1%
720                    942        0.1%
Other values (1105)    1007789    99.1%

Smallest values

Value    Count    Frequency (%)
1        942      0.1%
2        942      0.1%
3        942      0.1%
4        942      0.1%
5        942      0.1%
6        942      0.1%
7        942      0.1%
8        942      0.1%
9        942      0.1%
10       942      0.1%

Largest values

Value    Count    Frequency (%)
1115     942      0.1%
1114     942      0.1%
1113     942      0.1%
1112     942      0.1%
1111     942      0.1%
1110     942      0.1%
1109     758      0.1%
1108     942      0.1%
1107     758      0.1%
1106     942      0.1%

DayOfWeek
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.998340557
Minimum: 0
Maximum: 6
Zeros: 144730
Zeros (%): 14.2%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 3
Q3: 5
95-th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.997390965
Coefficient of variation (CV): 0.6661654761
Kurtosis: -1.246873339
Mean: 2.998340557
Median Absolute Deviation (MAD): 2
Skewness: 0.001592822804
Sum: 3049939
Variance: 3.989570667
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common Values

Value    Count     Frequency (%)
4        145845    14.3%
3        145845    14.3%
2        145665    14.3%
1        145664    14.3%
0        144730    14.2%
6        144730    14.2%
5        144730    14.2%

Smallest values

Value    Count     Frequency (%)
0        144730    14.2%
1        145664    14.3%
2        145665    14.3%
3        145845    14.3%
4        145845    14.3%
5        144730    14.2%
6        144730    14.2%

Largest values

Value    Count     Frequency (%)
6        144730    14.2%
5        144730    14.2%
4        145845    14.3%
3        145845    14.3%
2        145665    14.3%
1        145664    14.3%
0        144730    14.2%
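
Because DayOfWeek has only seven distinct values, its frequency table carries enough information to recover the reported Sum and Mean. A minimal sketch, using only the counts listed above:

```python
# Value -> count, copied from the DayOfWeek frequency table.
value_counts = {
    0: 144_730,
    1: 145_664,
    2: 145_665,
    3: 145_845,
    4: 145_845,
    5: 144_730,
    6: 144_730,
}

n = sum(value_counts.values())                                  # number of observations
total = sum(value * count for value, count in value_counts.items())  # weighted sum
mean = total / n

print(n, total, round(mean, 9))
```

The three values should agree with "Number of observations" in the overview and with "Sum" and "Mean" for this variable.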

Date
Categorical

HIGH CARDINALITY

Distinct: 942
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 MiB

Value                 Count
2015-07-31            1115
2013-11-06            1115
2013-11-18            1115
2013-11-17            1115
2013-11-16            1115
Other values (937)    1011634

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 10172090
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
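
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below classifies the first sample value of Date and scales the per-row counts to the whole column, assuming every row has the same `YYYY-MM-DD` layout (the report's minimum and maximum length of 10 is consistent with that, but does not prove it). This shows the idea only and is not pandas-profiling's implementation.

```python
import unicodedata
from collections import Counter

sample = "2015-07-31"  # first sample row of the Date column

# "Nd" is Decimal Number, "Pd" is Dash Punctuation.
category_counts = Counter(unicodedata.category(ch) for ch in sample)
print(category_counts)

# Scale to the full column, assuming every row shares this layout.
n_rows = 1_017_209
digit_total = category_counts["Nd"] * n_rows
dash_total = category_counts["Pd"] * n_rows
print(digit_total, dash_total)
```

Under that assumption the two totals match the "Decimal Number" and "Dash Punctuation" counts reported for this variable.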

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2015-07-31
2nd row: 2015-07-30
3rd row: 2015-07-29
4th row: 2015-07-28
5th row: 2015-07-27

Common Values

Value                 Count      Frequency (%)
2015-07-31            1115       0.1%
2013-11-06            1115       0.1%
2013-11-18            1115       0.1%
2013-11-17            1115       0.1%
2013-11-16            1115       0.1%
2013-11-15            1115       0.1%
2013-11-14            1115       0.1%
2013-11-13            1115       0.1%
2013-11-12            1115       0.1%
2013-11-11            1115       0.1%
Other values (932)    1006059    98.9%

Length

Histogram of lengths of the category

Value                 Count      Frequency (%)
2015-07-31            1115       0.1%
2015-05-13            1115       0.1%
2015-03-22            1115       0.1%
2015-03-23            1115       0.1%
2015-03-24            1115       0.1%
2015-03-25            1115       0.1%
2015-03-26            1115       0.1%
2015-03-27            1115       0.1%
2015-04-15            1115       0.1%
2015-04-17            1115       0.1%
Other values (932)    1006059    98.9%

Most occurring characters

Value    Count      Frequency (%)
0        2307842    22.7%
-        2034418    20.0%
1        1825657    17.9%
2        1606379    15.8%
3        660614     6.5%
4        574660     5.6%
5        440530     4.3%
6        200805     2.0%
7        198570     2.0%
8        164005     1.6%

Most occurring categories

Value               Count      Frequency (%)
Decimal Number      8137672    80.0%
Dash Punctuation    2034418    20.0%

Most frequent character per category

Decimal Number
Value    Count      Frequency (%)
0        2307842    28.4%
1        1825657    22.4%
2        1606379    19.7%
3        660614     8.1%
4        574660     7.1%
5        440530     5.4%
6        200805     2.5%
7        198570     2.4%
8        164005     2.0%
9        158610     1.9%
Dash Punctuation
Value    Count      Frequency (%)
-        2034418    100.0%

Most occurring scripts

Value     Count       Frequency (%)
Common    10172090    100.0%

Most frequent character per script

Common
Value    Count      Frequency (%)
0        2307842    22.7%
-        2034418    20.0%
1        1825657    17.9%
2        1606379    15.8%
3        660614     6.5%
4        574660     5.6%
5        440530     4.3%
6        200805     2.0%
7        198570     2.0%
8        164005     1.6%

Most occurring blocks

Value    Count       Frequency (%)
ASCII    10172090    100.0%

Most frequent character per block

ASCII
Value    Count      Frequency (%)
0        2307842    22.7%
-        2034418    20.0%
1        1825657    17.9%
2        1606379    15.8%
3        660614     6.5%
4        574660     5.6%
5        440530     4.3%
6        200805     2.0%
7        198570     2.0%
8        164005     1.6%

Sales
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 21734
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5773.818972
Minimum: 0
Maximum: 41551
Zeros: 172871
Zeros (%): 17.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 3727
median: 5744
Q3: 7856
95-th percentile: 12137
Maximum: 41551
Range: 41551
Interquartile range (IQR): 4129

Descriptive statistics

Standard deviation: 3849.926175
Coefficient of variation (CV): 0.6667902464
Kurtosis: 1.778374747
Mean: 5773.818972
Median Absolute Deviation (MAD): 2067
Skewness: 0.6414596158
Sum: 5873180623
Variance: 14821931.55
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                   Count     Frequency (%)
0                       172871    17.0%
5674                    215       < 0.1%
5558                    197       < 0.1%
5483                    196       < 0.1%
6214                    195       < 0.1%
6049                    195       < 0.1%
5723                    194       < 0.1%
5449                    192       < 0.1%
5140                    191       < 0.1%
5489                    191       < 0.1%
Other values (21724)    842572    82.8%

Smallest values

Value    Count     Frequency (%)
0        172871    17.0%
46       1         < 0.1%
124      1         < 0.1%
133      1         < 0.1%
286      1         < 0.1%
297      1         < 0.1%
316      1         < 0.1%
416      1         < 0.1%
506      1         < 0.1%
520      1         < 0.1%

Largest values

Value    Count    Frequency (%)
41551    1        < 0.1%
38722    1        < 0.1%
38484    1        < 0.1%
38367    1        < 0.1%
38037    1        < 0.1%
38025    1        < 0.1%
37646    1        < 0.1%
37403    1        < 0.1%
37376    1        < 0.1%
37122    1        < 0.1%

Customers
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 4086
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 633.1459464
Minimum: 0
Maximum: 7388
Zeros: 172869
Zeros (%): 17.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 405
median: 609
Q3: 837
95-th percentile: 1362
Maximum: 7388
Range: 7388
Interquartile range (IQR): 432

Descriptive statistics

Standard deviation: 464.4117339
Coefficient of variation (CV): 0.7334987083
Kurtosis: 7.091772718
Mean: 633.1459464
Median Absolute Deviation (MAD): 216
Skewness: 1.59865029
Sum: 644041755
Variance: 215678.2586
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                  Count     Frequency (%)
0                      172869    17.0%
560                    2414      0.2%
576                    2363      0.2%
603                    2337      0.2%
571                    2330      0.2%
555                    2328      0.2%
566                    2327      0.2%
517                    2326      0.2%
539                    2309      0.2%
651                    2299      0.2%
Other values (4076)    823307    80.9%

Smallest values

Value    Count     Frequency (%)
0        172869    17.0%
3        1         < 0.1%
5        1         < 0.1%
8        1         < 0.1%
13       1         < 0.1%
18       1         < 0.1%
36       1         < 0.1%
40       1         < 0.1%
44       1         < 0.1%
50       1         < 0.1%

Largest values

Value    Count    Frequency (%)
7388     1        < 0.1%
5494     1        < 0.1%
5458     1        < 0.1%
5387     1        < 0.1%
5297     1        < 0.1%
5192     1        < 0.1%
5152     1        < 0.1%
5145     1        < 0.1%
5132     1        < 0.1%
5112     1        < 0.1%

Open
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 MiB

Value    Count
1        844392
0        172817

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value    Count     Frequency (%)
1        844392    83.0%
0        172817    17.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count     Frequency (%)
1        844392    83.0%
0        172817    17.0%

Most occurring characters

Value    Count     Frequency (%)
1        844392    83.0%
0        172817    17.0%

Most occurring categories

Value             Count      Frequency (%)
Decimal Number    1017209    100.0%

Most frequent character per category

Decimal Number
Value    Count     Frequency (%)
1        844392    83.0%
0        172817    17.0%

Most occurring scripts

Value     Count      Frequency (%)
Common    1017209    100.0%

Most frequent character per script

Common
Value    Count     Frequency (%)
1        844392    83.0%
0        172817    17.0%

Most occurring blocks

Value    Count      Frequency (%)
ASCII    1017209    100.0%

Most frequent character per block

ASCII
Value    Count     Frequency (%)
1        844392    83.0%
0        172817    17.0%

Promo
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 MiB

Value    Count
0        629129
1        388080

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value    Count     Frequency (%)
0        629129    61.8%
1        388080    38.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count     Frequency (%)
0        629129    61.8%
1        388080    38.2%

Most occurring characters

Value    Count     Frequency (%)
0        629129    61.8%
1        388080    38.2%

Most occurring categories

Value             Count      Frequency (%)
Decimal Number    1017209    100.0%

Most frequent character per category

Decimal Number
Value    Count     Frequency (%)
0        629129    61.8%
1        388080    38.2%

Most occurring scripts

Value     Count      Frequency (%)
Common    1017209    100.0%

Most frequent character per script

Common
Value    Count     Frequency (%)
0        629129    61.8%
1        388080    38.2%

Most occurring blocks

Value    Count      Frequency (%)
ASCII    1017209    100.0%

Most frequent character per block

ASCII
Value    Count     Frequency (%)
0        629129    61.8%
1        388080    38.2%

StateHoliday
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 MiB

Value    Count
0        986159
a        20260
b        6690
c        4100

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value    Count     Frequency (%)
0        986159    96.9%
a        20260     2.0%
b        6690      0.7%
c        4100      0.4%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count     Frequency (%)
0        986159    96.9%
a        20260     2.0%
b        6690      0.7%
c        4100      0.4%

Most occurring characters

Value    Count     Frequency (%)
0        986159    96.9%
a        20260     2.0%
b        6690      0.7%
c        4100      0.4%

Most occurring categories

Value               Count     Frequency (%)
Decimal Number      986159    96.9%
Lowercase Letter    31050     3.1%

Most frequent character per category

Lowercase Letter
Value    Count    Frequency (%)
a        20260    65.2%
b        6690     21.5%
c        4100     13.2%
Decimal Number
Value    Count     Frequency (%)
0        986159    100.0%

Most occurring scripts

Value     Count     Frequency (%)
Common    986159    96.9%
Latin     31050     3.1%

Most frequent character per script

Latin
Value    Count    Frequency (%)
a        20260    65.2%
b        6690     21.5%
c        4100     13.2%
Common
Value    Count     Frequency (%)
0        986159    100.0%

Most occurring blocks

Value    Count      Frequency (%)
ASCII    1017209    100.0%

Most frequent character per block

ASCII
Value    Count     Frequency (%)
0        986159    96.9%
a        20260     2.0%
b        6690      0.7%
c        4100      0.4%

SchoolHoliday
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 MiB

Value    Count
0        835488
1        181721

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value    Count     Frequency (%)
0        835488    82.1%
1        181721    17.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count     Frequency (%)
0        835488    82.1%
1        181721    17.9%

Most occurring characters

Value    Count     Frequency (%)
0        835488    82.1%
1        181721    17.9%

Most occurring categories

Value             Count      Frequency (%)
Decimal Number    1017209    100.0%

Most frequent character per category

Decimal Number
Value    Count     Frequency (%)
0        835488    82.1%
1        181721    17.9%

Most occurring scripts

Value     Count      Frequency (%)
Common    1017209    100.0%

Most frequent character per script

Common
Value    Count     Frequency (%)
0        835488    82.1%
1        181721    17.9%

Most occurring blocks

Value    Count      Frequency (%)
ASCII    1017209    100.0%

Most frequent character per block

ASCII
Value    Count     Frequency (%)
0        835488    82.1%
1        181721    17.9%

StoreType
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 MiB

Value    Count
a        551627
d        312912
c        136840
b        15830

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: c
2nd row: c
3rd row: c
4th row: c
5th row: c

Common Values

Value    Count     Frequency (%)
a        551627    54.2%
d        312912    30.8%
c        136840    13.5%
b        15830     1.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count     Frequency (%)
a        551627    54.2%
d        312912    30.8%
c        136840    13.5%
b        15830     1.6%

Most occurring characters

Value    Count     Frequency (%)
a        551627    54.2%
d        312912    30.8%
c        136840    13.5%
b        15830     1.6%

Most occurring categories

Value               Count      Frequency (%)
Lowercase Letter    1017209    100.0%

Most frequent character per category

Lowercase Letter
Value    Count     Frequency (%)
a        551627    54.2%
d        312912    30.8%
c        136840    13.5%
b        15830     1.6%

Most occurring scripts

Value    Count      Frequency (%)
Latin    1017209    100.0%

Most frequent character per script

Latin
Value    Count     Frequency (%)
a        551627    54.2%
d        312912    30.8%
c        136840    13.5%
b        15830     1.6%

Most occurring blocks

Value    Count      Frequency (%)
ASCII    1017209    100.0%

Most frequent character per block

ASCII
Value    Count     Frequency (%)
a        551627    54.2%
d        312912    30.8%
c        136840    13.5%
b        15830     1.6%

Assortment
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 MiB

Value    Count
a        537445
c        471470
b        8294

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: a
2nd row: a
3rd row: a
4th row: a
5th row: a

Common Values

Value    Count     Frequency (%)
a        537445    52.8%
c        471470    46.3%
b        8294      0.8%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count     Frequency (%)
a        537445    52.8%
c        471470    46.3%
b        8294      0.8%

Most occurring characters

Value    Count     Frequency (%)
a        537445    52.8%
c        471470    46.3%
b        8294      0.8%

Most occurring categories

Value               Count      Frequency (%)
Lowercase Letter    1017209    100.0%

Most frequent character per category

Lowercase Letter
Value    Count     Frequency (%)
a        537445    52.8%
c        471470    46.3%
b        8294      0.8%

Most occurring scripts

Value    Count      Frequency (%)
Latin    1017209    100.0%

Most frequent character per script

Latin
Value    Count     Frequency (%)
a        537445    52.8%
c        471470    46.3%
b        8294      0.8%

Most occurring blocks

Value    Count      Frequency (%)
ASCII    1017209    100.0%

Most frequent character per block

ASCII
Value    Count     Frequency (%)
a        537445    52.8%
c        471470    46.3%
b        8294      0.8%

CompetitionDistance
Real number (ℝ≥0)

Distinct: 654
Distinct (%): 0.1%
Missing: 2642
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 5430.085652
Minimum: 20
Maximum: 75860
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 20
5-th percentile: 130
Q1: 710
median: 2330
Q3: 6890
95-th percentile: 20390
Maximum: 75860
Range: 75840
Interquartile range (IQR): 6180

Descriptive statistics

Standard deviation: 7715.3237
Coefficient of variation (CV): 1.420847514
Kurtosis: 13.00002236
Mean: 5430.085652
Median Absolute Deviation (MAD): 1980
Skewness: 2.928534017
Sum: 5509185710
Variance: 59526219.8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                 Count     Frequency (%)
250                   11120     1.1%
50                    7536      0.7%
350                   7536      0.7%
1200                  7374      0.7%
190                   7352      0.7%
180                   6594      0.6%
90                    6594      0.6%
330                   6410      0.6%
150                   6226      0.6%
2640                  5652      0.6%
Other values (644)    942173    92.6%

Smallest values

Value    Count    Frequency (%)
20       942      0.1%
30       3767     0.4%
40       4710     0.5%
50       7536     0.7%
60       2826     0.3%
70       4526     0.4%
80       2826     0.3%
90       6594     0.6%
100      4710     0.5%
110      5468     0.5%

Largest values

Value    Count    Frequency (%)
75860    942      0.1%
58260    942      0.1%
48330    942      0.1%
46590    942      0.1%
45740    942      0.1%
44320    942      0.1%
40860    942      0.1%
40540    942      0.1%
38710    942      0.1%
38630    942      0.1%

CompetitionOpenSinceMonth
Real number (ℝ≥0)

MISSING

Distinct: 12
Distinct (%): < 0.1%
Missing: 323348
Missing (%): 31.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.222865963
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
median: 8
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.211832113
Coefficient of variation (CV): 0.4446755803
Kurtosis: -1.248357036
Mean: 7.222865963
Median Absolute Deviation (MAD): 3
Skewness: -0.1698616346
Sum: 5011665
Variance: 10.31586553
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Common Values

Value               Count     Frequency (%)
9                   114254    11.2%
4                   87076     8.6%
11                  84455     8.3%
3                   63548     6.2%
7                   59434     5.8%
12                  57896     5.7%
10                  55622     5.5%
6                   45444     4.5%
5                   39608     3.9%
2                   37886     3.7%
Other values (2)    48638     4.8%
(Missing)           323348    31.8%

Smallest values

Value    Count     Frequency (%)
1        12452     1.2%
2        37886     3.7%
3        63548     6.2%
4        87076     8.6%
5        39608     3.9%
6        45444     4.5%
7        59434     5.8%
8        36186     3.6%
9        114254    11.2%
10       55622     5.5%

Largest values

Value    Count     Frequency (%)
12       57896     5.7%
11       84455     8.3%
10       55622     5.5%
9        114254    11.2%
8        36186     3.6%
7        59434     5.8%
6        45444     4.5%
5        39608     3.9%
4        87076     8.6%
3        63548     6.2%

CompetitionOpenSinceYear
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 23
Distinct (%): < 0.1%
Missing: 323348
Missing (%): 31.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 2008.690228
Minimum: 1900
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 1900
5-th percentile: 2001
Q1: 2006
median: 2010
Q3: 2013
95-th percentile: 2015
Maximum: 2015
Range: 115
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 5.992644444
Coefficient of variation (CV): 0.002983359187
Kurtosis: 121.934675
Mean: 2008.690228
Median Absolute Deviation (MAD): 3
Skewness: -7.539514879
Sum: 1393751810
Variance: 35.91178743
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=23)

Common Values

Value                Count     Frequency (%)
2013                 75426     7.4%
2012                 74299     7.3%
2014                 63732     6.3%
2005                 56564     5.6%
2010                 51258     5.0%
2011                 49396     4.9%
2009                 49396     4.9%
2008                 48476     4.8%
2007                 43744     4.3%
2006                 42802     4.2%
Other values (13)    138768    13.6%
(Missing)            323348    31.8%

Smallest values

Value    Count    Frequency (%)
1900     758      0.1%
1961     942      0.1%
1990     4710     0.5%
1994     1884     0.2%
1995     1700     0.2%
1998     942      0.1%
1999     7352     0.7%
2000     9236     0.9%
2001     14704    1.4%
2002     24882    2.4%

Largest values

Value    Count    Frequency (%)
2015     35060    3.4%
2014     63732    6.3%
2013     75426    7.4%
2012     74299    7.3%
2011     49396    4.9%
2010     51258    5.0%
2009     49396    4.9%
2008     48476    4.8%
2007     43744    4.3%
2006     42802    4.2%

Promo2
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 MiB

Value    Count
1        509178
0        508031

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1017209
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value    Count     Frequency (%)
1        509178    50.1%
0        508031    49.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count     Frequency (%)
1        509178    50.1%
0        508031    49.9%

Most occurring characters

Value    Count     Frequency (%)
1        509178    50.1%
0        508031    49.9%

Most occurring categories

Value             Count      Frequency (%)
Decimal Number    1017209    100.0%

Most frequent character per category

Decimal Number
Value    Count     Frequency (%)
1        509178    50.1%
0        508031    49.9%

Most occurring scripts

Value     Count      Frequency (%)
Common    1017209    100.0%

Most frequent character per script

Common
Value    Count     Frequency (%)
1        509178    50.1%
0        508031    49.9%

Most occurring blocks

Value    Count      Frequency (%)
ASCII    1017209    100.0%

Most frequent character per block

ASCII
Value    Count     Frequency (%)
1        509178    50.1%
0        508031    49.9%

Promo2SinceWeek
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 24
Distinct (%): < 0.1%
Missing: 508031
Missing (%): 49.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.26909254
Minimum: 1
Maximum: 50
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 13
median: 22
Q3: 37
95-th percentile: 45
Maximum: 50
Range: 49
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 14.09597253
Coefficient of variation (CV): 0.6057809305
Kurtosis: -1.369928605
Mean: 23.26909254
Median Absolute Deviation (MAD): 13
Skewness: 0.1045275226
Sum: 11848110
Variance: 198.6964415
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)

Common Values

Value                Count     Frequency (%)
14                   72990     7.2%
40                   62598     6.2%
31                   39976     3.9%
10                   38828     3.8%
5                    35818     3.5%
37                   32786     3.2%
1                    32418     3.2%
13                   29820     2.9%
45                   29268     2.9%
22                   28694     2.8%
Other values (14)    105982    10.4%
(Missing)            508031    49.9%

Smallest values

Value    Count    Frequency (%)
1        32418    3.2%
5        35818    3.5%
6        942      0.1%
9        12452    1.2%
10       38828    3.8%
13       29820    2.9%
14       72990    7.2%
18       27318    2.7%
22       28694    2.8%
23       4342     0.4%

Largest values

Value    Count    Frequency (%)
50       942      0.1%
49       758      0.1%
48       8294     0.8%
45       29268    2.9%
44       2642     0.3%
40       62598    6.2%
39       4732     0.5%
37       32786    3.2%
36       9236     0.9%
35       22814    2.2%

Promo2SinceYear
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 508031
Missing (%): 49.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.752774
Minimum: 2009
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.8 MiB

Quantile statistics

Minimum: 2009
5-th percentile: 2009
Q1: 2011
median: 2012
Q3: 2013
95-th percentile: 2014
Maximum: 2015
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.662870431
Coefficient of variation (CV): 0.0008265779235
Kurtosis: -1.04066228
Mean: 2011.752774
Median Absolute Deviation (MAD): 1
Skewness: -0.1200599167
Sum: 1024340254
Variance: 2.765138069
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common Values

Value        Count     Frequency (%)
2011         115056    11.3%
2013         110464    10.9%
2014         79922     7.9%
2012         73174     7.2%
2009         65270     6.4%
2010         56240     5.5%
2015         9052      0.9%
(Missing)    508031    49.9%

Smallest values

Value    Count     Frequency (%)
2009     65270     6.4%
2010     56240     5.5%
2011     115056    11.3%
2012     73174     7.2%
2013     110464    10.9%
2014     79922     7.9%
2015     9052      0.9%

Largest values

Value    Count     Frequency (%)
2015     9052      0.9%
2014     79922     7.9%
2013     110464    10.9%
2012     73174     7.2%
2011     115056    11.3%
2010     56240     5.5%
2009     65270     6.4%

PromoInterval
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 508031
Missing (%): 49.9%
Memory size: 7.8 MiB

Value               Count
Jan,Apr,Jul,Oct     293122
Feb,May,Aug,Nov     118596
Mar,Jun,Sept,Dec    97460

Length

Max length: 16
Median length: 15
Mean length: 15.19140654
Min length: 15
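
The fractional mean length follows from the three category counts: only the "Mar,Jun,Sept,Dec" value is 16 characters long because it uses "Sept". A minimal sketch that rederives the mean length and the total character count from the counts reported for this variable, taking the average over non-missing rows only (an assumption, though it matches the reported figure):

```python
# Category -> count, copied from the PromoInterval value counts.
counts = {
    "Jan,Apr,Jul,Oct": 293_122,
    "Feb,May,Aug,Nov": 118_596,
    "Mar,Jun,Sept,Dec": 97_460,
}

# Total characters across all non-missing rows.
total_chars = sum(len(value) * n for value, n in counts.items())
non_missing = sum(counts.values())
mean_length = total_chars / non_missing

print(total_chars, non_missing, round(mean_length, 8))
```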

Characters and Unicode

Total characters: 7735130
Distinct characters: 23
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Jan,Apr,Jul,Oct
2nd row: Jan,Apr,Jul,Oct
3rd row: Jan,Apr,Jul,Oct
4th row: Jan,Apr,Jul,Oct
5th row: Jan,Apr,Jul,Oct

Common Values

Value               Count     Frequency (%)
Jan,Apr,Jul,Oct     293122    28.8%
Feb,May,Aug,Nov     118596    11.7%
Mar,Jun,Sept,Dec    97460     9.6%
(Missing)           508031    49.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value               Count     Frequency (%)
jan,apr,jul,oct     293122    57.6%
feb,may,aug,nov     118596    23.3%
mar,jun,sept,dec    97460     19.1%

Most occurring characters

Value                Count      Frequency (%)
,                    1527534    19.7%
J                    683704     8.8%
u                    509178     6.6%
a                    509178     6.6%
A                    411718     5.3%
c                    390582     5.0%
t                    390582     5.0%
r                    390582     5.0%
p                    390582     5.0%
n                    390582     5.0%
Other values (13)    2140908    27.7%

Most occurring categories

Value                Count      Frequency (%)
Lowercase Letter     4170884    53.9%
Uppercase Letter     2036712    26.3%
Other Punctuation    1527534    19.7%

Most frequent character per category

Lowercase Letter
Value               Count     Frequency (%)
u                   509178    12.2%
a                   509178    12.2%
c                   390582    9.4%
t                   390582    9.4%
r                   390582    9.4%
p                   390582    9.4%
n                   390582    9.4%
e                   313516    7.5%
l                   293122    7.0%
b                   118596    2.8%
Other values (4)    474384    11.4%
Uppercase Letter
Value    Count     Frequency (%)
J        683704    33.6%
A        411718    20.2%
O        293122    14.4%
M        216056    10.6%
F        118596    5.8%
N        118596    5.8%
S        97460     4.8%
D        97460     4.8%
Other Punctuation
Value    Count      Frequency (%)
,        1527534    100.0%

Most occurring scripts

Value     Count      Frequency (%)
Latin     6207596    80.3%
Common    1527534    19.7%

Most frequent character per script

Latin
Value                Count      Frequency (%)
J                    683704     11.0%
u                    509178     8.2%
a                    509178     8.2%
A                    411718     6.6%
c                    390582     6.3%
t                    390582     6.3%
r                    390582     6.3%
p                    390582     6.3%
n                    390582     6.3%
e                    313516     5.1%
Other values (12)    1827392    29.4%
Common
Value    Count      Frequency (%)
,        1527534    100.0%

Most occurring blocks

Value    Count      Frequency (%)
ASCII    7735130    100.0%

Most frequent character per block

ASCII
Value                Count      Frequency (%)
,                    1527534    19.7%
J                    683704     8.8%
u                    509178     6.6%
a                    509178     6.6%
A                    411718     5.3%
c                    390582     5.0%
t                    390582     5.0%
r                    390582     5.0%
p                    390582     5.0%
n                    390582     5.0%
Other values (13)    2140908    27.7%

Year
Categorical

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.8 MiB
2013
406974 
2014
373855 
2015
236380 

Length

Max length4
Median length4
Mean length4
Min length4

Characters and Unicode

Total characters4068836
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row  2015
2nd row  2015
3rd row  2015
4th row  2015
5th row  2015

Common Values

Value  Count   Frequency (%)
2013   406974  40.0%
2014   373855  36.8%
2015   236380  23.2%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count   Frequency (%)
2013   406974  40.0%
2014   373855  36.8%
2015   236380  23.2%

Most occurring characters

Value  Count    Frequency (%)
2      1017209  25.0%
0      1017209  25.0%
1      1017209  25.0%
3      406974   10.0%
4      373855   9.2%
5      236380   5.8%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  4068836  100.0%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
2      1017209  25.0%
0      1017209  25.0%
1      1017209  25.0%
3      406974   10.0%
4      373855   9.2%
5      236380   5.8%

Most occurring scripts

Value   Count    Frequency (%)
Common  4068836  100.0%

Most frequent character per script

Common
Value  Count    Frequency (%)
2      1017209  25.0%
0      1017209  25.0%
1      1017209  25.0%
3      406974   10.0%
4      373855   9.2%
5      236380   5.8%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  4068836  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
2      1017209  25.0%
0      1017209  25.0%
1      1017209  25.0%
3      406974   10.0%
4      373855   9.2%
5      236380   5.8%

Month
Real number (ℝ≥0)

Distinct      12
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          5.846762072
Minimum       1
Maximum       12
Zeros         0
Zeros (%)     0.0%
Negative      0
Negative (%)  0.0%
Memory size   7.8 MiB

Quantile statistics

Minimum                    1
5-th percentile            1
Q1                         3
median                     6
Q3                         8
95-th percentile           12
Maximum                    12
Range                      11
Interquartile range (IQR)  5

Descriptive statistics

Standard deviation               3.326096562
Coefficient of variation (CV)    0.5688783845
Kurtosis                         -1.017876008
Mean                             5.846762072
Median Absolute Deviation (MAD)  3
Skewness                         0.2742016429
Sum                              5947379
Variance                         11.06291834
Monotonicity                     Not monotonic

[Figure: Histogram with fixed size bins (bins=12)]
Common values
Value             Count   Frequency (%)
5                 103695  10.2%
3                 103695  10.2%
1                 103694  10.2%
6                 100350  9.9%
4                 100350  9.9%
7                 98115   9.6%
2                 93660   9.2%
12                63550   6.2%
10                63550   6.2%
8                 63550   6.2%
Other values (2)  123000  12.1%

Smallest 10 values
Value  Count   Frequency (%)
1      103694  10.2%
2      93660   9.2%
3      103695  10.2%
4      100350  9.9%
5      103695  10.2%
6      100350  9.9%
7      98115   9.6%
8      63550   6.2%
9      61500   6.0%
10     63550   6.2%

Largest 10 values
Value  Count   Frequency (%)
12     63550   6.2%
11     61500   6.0%
10     63550   6.2%
9      61500   6.0%
8      63550   6.2%
7      98115   9.6%
6      100350  9.9%
5      103695  10.2%
4      100350  9.9%
3      103695  10.2%
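
As a cross-check, the summary statistics reported for Month can be recomputed from its value counts alone. The sketch below copies the twelve counts from the frequency tables in this section. It assumes the reported variance is the sample variance (n - 1 in the denominator), which is an inference from the numbers rather than something the report states.

```python
from math import sqrt

# Month -> count, copied from the Month frequency tables in this report.
month_counts = {
    1: 103694, 2: 93660, 3: 103695, 4: 100350, 5: 103695, 6: 100350,
    7: 98115, 8: 63550, 9: 61500, 10: 63550, 11: 61500, 12: 63550,
}

n = sum(month_counts.values())                      # number of observations
total = sum(v * c for v, c in month_counts.items()) # "Sum" in the report
mean = total / n

# Assumption: sample variance (divide by n - 1), not population variance.
variance = sum(c * (v - mean) ** 2 for v, c in month_counts.items()) / (n - 1)
std = sqrt(variance)
cv = std / mean                                     # coefficient of variation

print(f"n={n} sum={total} mean={mean:.9f}")
print(f"variance={variance:.8f} std={std:.9f} cv={cv:.10f}")
```

The coefficient of variation is simply the standard deviation divided by the mean, so it follows from the other two figures.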

Day
Real number (ℝ≥0)

Distinct      31
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          15.70278969
Minimum       1
Maximum       31
Zeros         0
Zeros (%)     0.0%
Negative      0
Negative (%)  0.0%
Memory size   7.8 MiB

Quantile statistics

Minimum                    1
5-th percentile            2
Q1                         8
median                     16
Q3                         23
95-th percentile           29
Maximum                    31
Range                      30
Interquartile range (IQR)  15

Descriptive statistics

Standard deviation               8.787637613
Coefficient of variation (CV)    0.5596227031
Kurtosis                         -1.192005785
Mean                             15.70278969
Median Absolute Deviation (MAD)  8
Skewness                         0.008454085266
Sum                              15973019
Variance                         77.22257483
Monotonicity                     Not monotonic

[Figure: Histogram with fixed size bins (bins=31)]
Common values
Value              Count   Frequency (%)
16                 33485   3.3%
17                 33485   3.3%
6                  33485   3.3%
7                  33485   3.3%
8                  33485   3.3%
9                  33485   3.3%
10                 33485   3.3%
11                 33485   3.3%
12                 33485   3.3%
13                 33485   3.3%
Other values (21)  682359  67.1%

Smallest 10 values
Value  Count  Frequency (%)
1      33484  3.3%
2      33485  3.3%
3      33485  3.3%
4      33485  3.3%
5      33485  3.3%
6      33485  3.3%
7      33485  3.3%
8      33485  3.3%
9      33485  3.3%
10     33485  3.3%

Largest 10 values
Value  Count  Frequency (%)
31     19350  1.9%
30     30140  3.0%
29     30140  3.0%
28     33485  3.3%
27     33485  3.3%
26     33485  3.3%
25     33485  3.3%
24     33485  3.3%
23     33485  3.3%
22     33485  3.3%

Interactions

[Figures: 121 interaction plots, rendered with Matplotlib v3.4.3; images not reproduced]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
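
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the report generator's implementation: `average_ranks` and `spearman_rho` are hypothetical helper names, ties are given the average of their positions (one common convention), and the `customers`/`sales` values are made-up toy data.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables over the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry)) / n
    std_rx = (sum((a - mean_rx) ** 2 for a in rx) / n) ** 0.5
    std_ry = (sum((b - mean_ry) ** 2 for b in ry) / n) ** 0.5
    return cov / (std_rx * std_ry)


# Toy data (hypothetical): sales grow monotonically but not linearly with customers.
customers = [100, 200, 300, 400, 500]
sales = [c ** 3 for c in customers]
rho = spearman_rho(customers, sales)
print(rho)
```

Because only the ordering of the values matters, the cubic relationship in the toy data still yields a coefficient at the top of the range.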

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line, that is its angle to the x-axis, does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
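
A minimal sketch of this calculation in plain Python follows. The helper name `pearson_r` and the toy values are hypothetical. The second call illustrates the invariance property: shifting and positively rescaling either variable leaves r unchanged.

```python
def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = (sum((a - mean_x) ** 2 for a in x) / n) ** 0.5
    std_y = (sum((b - mean_y) ** 2 for b in y) / n) ** 0.5
    return cov / (std_x * std_y)


# Toy data (hypothetical values).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(x, y)

# Separate changes in location and (positive) scale of each variable.
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_transformed)
```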

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
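
The pair-counting definition can be sketched directly. The function name `kendall_tau_a` is hypothetical. It implements the formula exactly as stated (often called tau-a), and tied pairs count as neither concordant nor discordant. Statistical libraries frequently apply a tie correction instead, so their results can differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, as described in the text."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Toy data (hypothetical): one swapped pair out of six.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

The double loop is quadratic in the number of observations, which is fine for a sketch but would be slow on a dataset of this report's size.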

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
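
The sketch below shows one way to compute a bias-corrected Cramér's V in plain Python. The correction terms follow the commonly cited form of Bergsma's 2013 proposal. Whether the report generator uses exactly these formulas is an assumption. The function name and the category labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of category labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals, col_totals = Counter(a), Counter(b)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row, row_n in row_totals.items():
        for col, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Assumed correction terms (commonly attributed to Bergsma, 2013).
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Toy labels (hypothetical): the second variable is fully determined by the first.
store_type = ["a", "a", "b", "b"] * 10
assortment = ["x", "x", "y", "y"] * 10
v_perfect = cramers_v_corrected(store_type, assortment)

# Toy labels (hypothetical): every combination occurs equally often.
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 10, ["x", "y", "x", "y"] * 10)
print(v_perfect, v_independent)
```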

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
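
The nullity correlation behind the heatmap can be sketched as the ordinary correlation between missing-value indicators. The example below is a plain-Python illustration under that assumption, not the plotting library's code. The function name is hypothetical, `None` stands in for a missing value, and the three short columns are made-up data that only borrow column names from this dataset.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the is-missing indicators of two columns."""
    a = [1.0 if value is None else 0.0 for value in col_a]
    b = [1.0 if value is None else 0.0 for value in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        # A column that is never (or always) missing has no defined correlation.
        return float("nan")
    return cov / sqrt(var_a * var_b)


# Toy columns (hypothetical values); None marks a missing cell.
promo2_since_week = [None, None, 22.0, 22.0, None, 14.0]
promo2_since_year = [None, None, 2012.0, 2012.0, None, 2011.0]
competition_open_since_month = [9.0, None, None, 11.0, 4.0, None]

always_together = nullity_correlation(promo2_since_week, promo2_since_year)
partly_related = nullity_correlation(promo2_since_week, competition_open_since_month)
print(always_together, partly_related)
```

A value of 1 means the two columns are always missing together, -1 means one is present exactly when the other is missing, and values near 0 mean the missingness patterns are unrelated.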

Sample

First rows

Row  Store  DayOfWeek  Date        Sales  Customers  Open  Promo  StateHoliday  SchoolHoliday  StoreType  Assortment  CompetitionDistance  CompetitionOpenSinceMonth  CompetitionOpenSinceYear  Promo2  Promo2SinceWeek  Promo2SinceYear  PromoInterval  Year  Month  Day
0    1      4          2015-07-31  5263   555        1     1      0             1              c          a           1270.0               9.0                        2008.0                    0       NaN              NaN              NaN            2015  7      31
1    1      3          2015-07-30  5020   546        1     1      0             1              c          a           1270.0               9.0                        2008.0                    0       NaN              NaN              NaN            2015  7      30
2    1      2          2015-07-29  4782   523        1     1      0             1              c          a           1270.0               9.0                        2008.0                    0       NaN              NaN              NaN            2015  7      29
3    1      1          2015-07-28  5011   560        1     1      0             1              c          a           1270.0               9.0                        2008.0                    0       NaN              NaN              NaN            2015  7      28
4    1      0          2015-07-27  6102   612        1     1      0             1              c          a           1270.0               9.0                        2008.0                    0       NaN              NaN              NaN            2015  7      27
5    1      6          2015-07-26  0      0          0     0      0             0              c          a           1270.0               9.0                        2008.0                    0       NaN              NaN              NaN            2015  7      26
6    1      5          2015-07-25  4364   500        1     0      0             0              c          a           1270.0               9.0                        2008.0                    0       NaN              NaN              NaN            2015  7      25
7    1      4          2015-07-24  3706   459        1     0      0             0              c          a           1270.0               9.0                        2008.0                    0       NaN              NaN              NaN            2015  7      24
8    1      3          2015-07-23  3769   503        1     0      0             0              c          a           1270.0               9.0                        2008.0                    0       NaN              NaN              NaN            2015  7      23
9    1      2          2015-07-22  3464   463        1     0      0             0              c          a           1270.0               9.0                        2008.0                    0       NaN              NaN              NaN            2015  7      22

Last rows

Row      Store  DayOfWeek  Date        Sales  Customers  Open  Promo  StateHoliday  SchoolHoliday  StoreType  Assortment  CompetitionDistance  CompetitionOpenSinceMonth  CompetitionOpenSinceYear  Promo2  Promo2SinceWeek  Promo2SinceYear  PromoInterval     Year  Month  Day
1017199  1115   3          2013-01-10  5007   339        1     1      0             1              d          c           5350.0               NaN                        NaN                       1       22.0             2012.0           Mar,Jun,Sept,Dec  2013  1      10
1017200  1115   2          2013-01-09  4649   324        1     1      0             1              d          c           5350.0               NaN                        NaN                       1       22.0             2012.0           Mar,Jun,Sept,Dec  2013  1      9
1017201  1115   1          2013-01-08  5243   341        1     1      0             1              d          c           5350.0               NaN                        NaN                       1       22.0             2012.0           Mar,Jun,Sept,Dec  2013  1      8
1017202  1115   0          2013-01-07  6905   471        1     1      0             1              d          c           5350.0               NaN                        NaN                       1       22.0             2012.0           Mar,Jun,Sept,Dec  2013  1      7
1017203  1115   6          2013-01-06  0      0          0     0      0             1              d          c           5350.0               NaN                        NaN                       1       22.0             2012.0           Mar,Jun,Sept,Dec  2013  1      6
1017204  1115   5          2013-01-05  4771   339        1     0      0             1              d          c           5350.0               NaN                        NaN                       1       22.0             2012.0           Mar,Jun,Sept,Dec  2013  1      5
1017205  1115   4          2013-01-04  4540   326        1     0      0             1              d          c           5350.0               NaN                        NaN                       1       22.0             2012.0           Mar,Jun,Sept,Dec  2013  1      4
1017206  1115   3          2013-01-03  4297   300        1     0      0             1              d          c           5350.0               NaN                        NaN                       1       22.0             2012.0           Mar,Jun,Sept,Dec  2013  1      3
1017207  1115   2          2013-01-02  3697   305        1     0      0             1              d          c           5350.0               NaN                        NaN                       1       22.0             2012.0           Mar,Jun,Sept,Dec  2013  1      2
1017208  1115   1          2013-01-01  0      0          0     0      a             1              d          c           5350.0               NaN                        NaN                       1       22.0             2012.0           Mar,Jun,Sept,Dec  2013  1      1